On the Size of Training Set and the Benefit from Ensemble

نویسندگان

  • Zhi-Hua Zhou
  • Dan Wei
  • Gang Li
  • Honghua Dai
چکیده

In this paper, the impact of the size of the training set on the benefit from ensemble, i.e. the gains obtained by employing ensemble learning paradigms, is empirically studied. Experiments on Bagged/ Boosted J4.8 decision trees with/without pruning show that enlarging the training set tends to improve the benefit from Boosting but does not significantly impact the benefit from Bagging. This phenomenon is then explained from the view of bias-variance reduction. Moreover, it is shown that even for Boosting, the benefit does not always increase consistently along with the increase of the training set size since single learners sometimes may learn relatively more from additional training data that are randomly provided than ensembles do. Furthermore, it is observed that the benefit from ensemble of unpruned decision trees is usually bigger than that from ensemble of pruned decision trees. This phenomenon is then explained from the view of error-ambiguity balance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Ensemble Kernel Learning Model for Prediction of Time Series Based on the Support Vector Regression and Meta Heuristic Search

In this paper, a method for predicting time series is presented. Time series prediction is a process which predicted future system values based on information obtained from past and present data points. Time series prediction models are widely used in various fields of engineering, economics, etc. The main purpose of using different models for time series prediction is to make the forecast with...

متن کامل

Distance Dependent Localization Approach in Oil Reservoir History Matching: A Comparative Study

To perform any economic management of a petroleum reservoir in real time, a predictable and/or updateable model of reservoir along with uncertainty estimation ability is required. One relatively recent method is a sequential Monte Carlo implementation of the Kalman filter: the Ensemble Kalman Filter (EnKF). The EnKF not only estimate uncertain parameters but also provide a recursive estimat...

متن کامل

Ensemble strategies to build neural network to facilitate decision making

There are three major strategies to form neural network ensembles. The simplest one is the Cross Validation strategy in which all members are trained with the same training data. Bagging and boosting strategies pro-duce perturbed sample from training data. This paper provides an ideal model based on two important factors: activation function and number of neurons in the hidden layer and based u...

متن کامل

The Application of Multi-Layer Artificial Neural Networks in Speckle Reduction (Methodology)

Optical Coherence Tomography (OCT) uses the spatial and temporal coherence properties of optical waves backscattered from a tissue sample to form an image. An inherent characteristic of coherent imaging is the presence of speckle noise. In this study we use a new ensemble framework which is a combination of several Multi-Layer Perceptron (MLP) neural networks to denoise OCT images. The noise is...

متن کامل

Optimum Ensemble Classification for Fully Polarimetric SAR Data Using Global-Local Classification Approach

In this paper, a proposed ensemble classification for fully polarimetric synthetic aperture radar (PolSAR) data using a global-local classification approach is presented. In the first step, to perform the global classification, the training feature space is divided into a specified number of clusters. In the next step to carry out the local classification over each of these clusters, which cont...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004